An Audio-Visual Imposture Scenario by Talking Face Animation
Authors
Abstract
With the advent of PDAs, handheld PCs, and mobile telephones that use biometric recognition for user authentication, there is a growing demand for automatic, non-intrusive voice and face speaker verification systems. Such systems can be embedded in mobile devices to allow biometrically recognized users to sign and send data electronically, and to give their telephone conversations legal value. The European project "Secure Contracts Signed by Mobile Phone" (SecurePhone) aims at developing such technology on a 3G/B3G-enabled PDA. One of the risks a speaker verification system faces is its vulnerability to imposture. With the current communication infrastructure lacking strong user identification, impostors aware of legal transactions can interfere in a telephone conversation so as to alter or replace the true conversation, or even initiate a conversation while impersonating another person. To combat imposture, it is necessary to study imposture techniques and scenarios. In this paper, we implement a system that allows an impostor to start and lead an audio-visual telephone conversation, and to sign and exchange data electronically on behalf of another person. During the conversation, the impostor's audio and video are altered so as to mimic the other person's voice and face. On the speech side, there exist processing techniques exploitable by impostors to reproduce the voice of an authorized client. In particular, speech segments obtained from a client's recordings can be used to synthesize new sentences that the client never pronounced. We explain how a very-low-bit-rate speech coding system, such as the ALISP-based one, can be adapted to serve forgery purposes, transforming any input speech into the client's voice. On the face side, the impostor's talking face is detected and facial features are extracted and tracked. Lip movements are used to animate a synthetic talking face (Greta). The texture of the impersonated face is mapped onto Greta and coded for transmission over the phone, along with the synthesized voice. Audio-visual coding and synthesis are realized by indexing a memory of audio-visual sequences; stochastic models (coupled HMMs) of characteristic segments drive the search in memory.
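To make the unit-selection forgery idea concrete, here is a minimal sketch: segments of the impostor's speech, already labeled by an ALISP-like recognizer (assumed upstream), are each replaced by the closest matching segment harvested from the client's recordings. All names here (forge_voice, dtw_distance, the segment and memory formats) are illustrative assumptions, not the SecurePhone or ALISP codebase.

```python
# Sketch of ALISP-style unit-selection voice forgery.
# Hypothetical interfaces; not the authors' actual implementation.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two feature sequences
    (frames x dims), used to pick the closest memorized segment."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def forge_voice(input_segments, client_memory):
    """Replace each recognized unit of the impostor's speech with the
    closest client segment carrying the same unit label.

    input_segments: list of (unit_label, feature_matrix) pairs produced
        by an ALISP-like recognizer run on the impostor's speech.
    client_memory: dict mapping unit_label -> list of client feature
        matrices harvested from the client's recordings.
    Returns the concatenated client-voice feature sequence, ready for
    waveform synthesis by a vocoder.
    """
    output = []
    for label, feats in input_segments:
        candidates = client_memory.get(label, [])
        if not candidates:
            # Unit never seen in the client's recordings: keep the
            # impostor's own segment as a fallback.
            output.append(feats)
            continue
        best = min(candidates, key=lambda seg: dtw_distance(feats, seg))
        output.append(best)
    return np.vstack(output)
```

In the full scenario described above, the output feature sequence would feed a vocoder, while a parallel stream of segment indices drives the Greta face animation; the coupled-HMM search mentioned in the abstract would replace the naive nearest-segment loop used in this sketch.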
Similar resources
Audio-Visual Identity Verification and Robustness to Imposture
The robustness of talking-face identity verification (IV) systems is best evaluated by monitoring their behavior under impostor attacks. We propose a scenario where the impostor uses a still face picture and a sample of speech of the genuine client to transform his/her speech and visual appearance into that of the target client. We propose MixTrans, an original text-independent technique for vo...
A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion
This paper proposes a novel approach towards a videorealistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. Then we ma...
Real-time speech-driven face animation with expressions using neural networks
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, au...
Real-Time Speech-Driven 3D Face Animation
In this paper, we present an approach for real-time speech-driven 3D face animation using neural networks. We first analyze a 3D facial movement sequence of a talking subject and learn a quantitative representation of the facial deformations, called the 3D Motion Units (MUs). A 3D facial deformation can be approximated by a linear combination of the MUs weighted by the MU parameters (MUPs) – th...
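As a quick illustration of the linear-combination model described in the snippet above (the notation here is ours, not the cited paper's), a 3D facial deformation $\mathbf{d}$ is approximated as

$$\mathbf{d} \approx \sum_{i=1}^{K} c_i\,\mathbf{m}_i,$$

where $\mathbf{m}_1,\dots,\mathbf{m}_K$ are the learned 3D Motion Units (MUs) and $c_i$ are the MU parameters (MUPs) estimated for each frame.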
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar
Facial expression is one of the most expressive ways for human beings to deliver their emotion, intention, and other nonverbal messages in face to face communications. In this chapter, a layered parametric framework is proposed to synthesize the emotional facial expressions for an MPEG4 compliant talking avatar based on the three dimensional PAD model, including pleasure-displeasure, arousal-no...
Publication date: 2004